Data Cleaning

Author: Javier Silva-Valencia
Affiliation: Instituut Voor Tropische Geneeskunde, Antwerp, Belgium

Published: 2023-02-28

Abstract
We follow the plan set out in The Analysis Plan to clean the data.


Data Cleaning

1. Reshape Q1_B_1

This dataset contains data on the animals kept by each household. We reshape it so that each row is one household, which lets us merge it with the other datasets, and then replace placeholder values (NA, -1) where needed.

# Load the dataset with the short variable names created earlier and store it as Q1_B_1
Q1_B_1 = readRDS("Q1_B_1.RDS")

1.1 Reshaping from long to wide

We need one line for each household.

So we use the reshape() function to say: “Reshape the data frame into wide format, with just one row for each household (FSN), and replicate the variables count, dist, indor and daysin for every animal type the household has.”

animals <- reshape(Q1_B_1,direction = "wide", timevar ="anim", idvar ="FSN", v.names = c("count","dist","indor", "daysin"), sep = "_")
str(animals)
'data.frame':   7147 obs. of  29 variables:
 $ FSN       : int  45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
 $ count_Goa : int  3 NA NA 2 NA 7 1 3 3 NA ...
 $ dist_Goa  : int  5 NA NA 5 NA 0 5 15 0 NA ...
 $ indor_Goa : int  1 NA NA 1 NA 1 1 1 1 NA ...
 $ daysin_Goa: int  120 NA NA 180 NA 360 150 90 90 NA ...
 $ count_Pou : int  1 NA NA NA NA NA NA NA NA 2 ...
 $ dist_Pou  : int  0 NA NA NA NA NA NA NA NA 0 ...
 $ indor_Pou : int  1 NA NA NA NA NA NA NA NA 1 ...
 $ daysin_Pou: int  365 NA NA NA NA NA NA NA NA 360 ...
 $ count_Buf : int  NA 1 NA NA NA NA NA NA NA NA ...
 $ dist_Buf  : int  NA 4 NA NA NA NA NA NA NA NA ...
 $ indor_Buf : int  NA 0 NA NA NA NA NA NA NA NA ...
 $ daysin_Buf: int  NA -1 NA NA NA NA NA NA NA NA ...
 $ count_Cow : int  NA NA 1 NA 2 NA 3 NA NA NA ...
 $ dist_Cow  : int  NA NA 3 NA 0 NA 15 NA NA NA ...
 $ indor_Cow : int  NA NA 1 NA 1 NA 0 NA NA NA ...
 $ daysin_Cow: int  NA NA 210 NA 360 NA -1 NA NA NA ...
 $ count_Pig : int  NA NA NA NA NA NA NA NA NA NA ...
 $ dist_Pig  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Pig : int  NA NA NA NA NA NA NA NA NA NA ...
 $ daysin_Pig: int  NA NA NA NA NA NA NA NA NA NA ...
 $ count_Dog : int  NA NA NA NA NA NA NA NA NA NA ...
 $ dist_Dog  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Dog : int  NA NA NA NA NA NA NA NA NA NA ...
 $ daysin_Dog: int  NA NA NA NA NA NA NA NA NA NA ...
 $ count_Oth : int  NA NA NA NA NA NA NA NA NA NA ...
 $ dist_Oth  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Oth : int  NA NA NA NA NA NA NA NA NA NA ...
 $ daysin_Oth: int  NA NA NA NA NA NA NA NA NA NA ...
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: chr [1:4] "count" "dist" "indor" "daysin"
  ..$ timevar: chr "anim"
  ..$ idvar  : chr "FSN"
  ..$ times  : chr [1:7] "Goa" "Pou" "Buf" "Cow" ...
  ..$ varying: chr [1:4, 1:7] "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa" ...
class(animals)
[1] "data.frame"
Tip

We observe:

  • There are now fewer observations: 7147, compared with 9029 before reshaping.

  • Many values are NA (households that do not have that animal).

  • Some values are -1.
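To see why the number of rows drops and NAs appear, here is a minimal sketch on an invented long-format table (the values are hypothetical, not taken from Q1_B_1). It uses the same reshape() call pattern as above, with only the count variable:

```r
# Toy long-format data (hypothetical values): one row per household-animal pair
long <- data.frame(
  FSN   = c(1, 1, 2, 3, 3, 3),
  anim  = c("Goa", "Cow", "Cow", "Goa", "Pou", "Buf"),
  count = c(3, 1, 2, 7, 1, -1)
)

# Same call pattern as above: one row per FSN, one count_<animal> column per animal
wide <- reshape(long, direction = "wide", timevar = "anim",
                idvar = "FSN", v.names = "count", sep = "_")

nrow(long)  # one row per household-animal pair
nrow(wide)  # one row per household
wide        # animals a household does not have show up as NA
```

Each household keeps a single row, and every animal it does not have becomes an NA cell, which is what happens at a larger scale in the real data.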

1.2 Replacing unnecessary values

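We replace the NA values with zero in the indor columns, and both the NA and the -1 values with zero in the count and daysin columns. The dist columns are left unchanged.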

names(animals)
 [1] "FSN"        "count_Goa"  "dist_Goa"   "indor_Goa"  "daysin_Goa"
 [6] "count_Pou"  "dist_Pou"   "indor_Pou"  "daysin_Pou" "count_Buf" 
[11] "dist_Buf"   "indor_Buf"  "daysin_Buf" "count_Cow"  "dist_Cow"  
[16] "indor_Cow"  "daysin_Cow" "count_Pig"  "dist_Pig"   "indor_Pig" 
[21] "daysin_Pig" "count_Dog"  "dist_Dog"   "indor_Dog"  "daysin_Dog"
[26] "count_Oth"  "dist_Oth"   "indor_Oth"  "daysin_Oth"
#str(animals)
#View(animals)

animals$indor_Pou[is.na(animals$indor_Pou)] <- 0
animals$indor_Pig[is.na(animals$indor_Pig)] <- 0
animals$indor_Oth[is.na(animals$indor_Oth)] <- 0
animals$indor_Goa[is.na(animals$indor_Goa)] <- 0
animals$indor_Dog[is.na(animals$indor_Dog)] <- 0
animals$indor_Cow[is.na(animals$indor_Cow)] <- 0
animals$indor_Buf[is.na(animals$indor_Buf)] <- 0

animals$daysin_Pou[is.na(animals$daysin_Pou) | animals$daysin_Pou == -1] <- 0
animals$daysin_Pig[is.na(animals$daysin_Pig) | animals$daysin_Pig == -1] <- 0
animals$daysin_Oth[is.na(animals$daysin_Oth) | animals$daysin_Oth == -1] <- 0
animals$daysin_Goa[is.na(animals$daysin_Goa) | animals$daysin_Goa == -1] <- 0
animals$daysin_Dog[is.na(animals$daysin_Dog) | animals$daysin_Dog == -1] <- 0
animals$daysin_Cow[is.na(animals$daysin_Cow) | animals$daysin_Cow == -1] <- 0
animals$daysin_Buf[is.na(animals$daysin_Buf) | animals$daysin_Buf == -1] <- 0

animals$count_Pou[is.na(animals$count_Pou)| animals$count_Pou == -1] <- 0
animals$count_Pig[is.na(animals$count_Pig)| animals$count_Pig == -1] <- 0
animals$count_Oth[is.na(animals$count_Oth)| animals$count_Oth == -1] <- 0
animals$count_Goa[is.na(animals$count_Goa)| animals$count_Goa == -1] <- 0
animals$count_Dog[is.na(animals$count_Dog)| animals$count_Dog == -1] <- 0
animals$count_Cow[is.na(animals$count_Cow)| animals$count_Cow == -1] <- 0
animals$count_Buf[is.na(animals$count_Buf)| animals$count_Buf == -1] <- 0

str(animals)
'data.frame':   7147 obs. of  29 variables:
 $ FSN       : int  45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
 $ count_Goa : num  3 0 0 2 0 7 1 3 3 0 ...
 $ dist_Goa  : int  5 NA NA 5 NA 0 5 15 0 NA ...
 $ indor_Goa : num  1 0 0 1 0 1 1 1 1 0 ...
 $ daysin_Goa: num  120 0 0 180 0 360 150 90 90 0 ...
 $ count_Pou : num  1 0 0 0 0 0 0 0 0 2 ...
 $ dist_Pou  : int  0 NA NA NA NA NA NA NA NA 0 ...
 $ indor_Pou : num  1 0 0 0 0 0 0 0 0 1 ...
 $ daysin_Pou: num  365 0 0 0 0 0 0 0 0 360 ...
 $ count_Buf : num  0 1 0 0 0 0 0 0 0 0 ...
 $ dist_Buf  : int  NA 4 NA NA NA NA NA NA NA NA ...
 $ indor_Buf : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Buf: num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Cow : num  0 0 1 0 2 0 3 0 0 0 ...
 $ dist_Cow  : int  NA NA 3 NA 0 NA 15 NA NA NA ...
 $ indor_Cow : num  0 0 1 0 1 0 0 0 0 0 ...
 $ daysin_Cow: num  0 0 210 0 360 0 0 0 0 0 ...
 $ count_Pig : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Pig  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Pig : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Pig: num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Dog : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Dog  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Dog : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Dog: num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Oth : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Oth  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Oth : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Oth: num  0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: chr [1:4] "count" "dist" "indor" "daysin"
  ..$ timevar: chr "anim"
  ..$ idvar  : chr "FSN"
  ..$ times  : chr [1:7] "Goa" "Pou" "Buf" "Cow" ...
  ..$ varying: chr [1:4, 1:7] "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa" ...
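The 21 replacement lines in 1.2 follow just two patterns, so they can also be written as two loops over the column names. Below is a minimal sketch on a small invented data frame (animals_toy is hypothetical); to use it on the real data, replace animals_toy with animals.

```r
# Hypothetical miniature of 'animals' after reshaping (values invented)
animals_toy <- data.frame(
  FSN        = c(1L, 2L, 3L),
  count_Goa  = c(3L, NA, 2L),
  dist_Goa   = c(5L, NA, 5L),
  indor_Goa  = c(1L, NA, 1L),
  daysin_Goa = c(120L, NA, -1L)
)

# indor_*: replace NA with 0 (as in the explicit lines of 1.2)
for (v in grep("^indor_", names(animals_toy), value = TRUE)) {
  animals_toy[[v]][is.na(animals_toy[[v]])] <- 0
}

# count_* and daysin_*: replace NA and -1 with 0
for (v in grep("^(count|daysin)_", names(animals_toy), value = TRUE)) {
  x <- animals_toy[[v]]
  x[is.na(x) | x == -1] <- 0
  animals_toy[[v]] <- x
}

# dist_* columns are deliberately left untouched, matching 1.2
str(animals_toy)
```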

2. Create an asset index

This index should combine information on consumer goods with whether the household owns a bovine animal and whether its house has brick walls.

2.1. Select consumer goods

Select the consumer goods (from the Q1_B_106 dataset) that will make up the asset index.

2.1.1. Importing the CSV file as Q1_B_106, with “,” as the separator and “.” as the decimal mark:

Q1_B_106 <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Q1_B_106.csv", sep=",", dec= ".")

2.1.2. Checking the structure and values of the dataset

str(Q1_B_106)
'data.frame':   13377 obs. of  25 variables:
 $ FSN             : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ Radio           : int  1 1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ CD_Player       : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ BW_Television   : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Color_Television: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Video_DVD_Player: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Mobile          : int  -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
 $ Non_Mobile_Phone: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Refrigerator    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Iron            : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Sewing_Machine  : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Watch           : int  -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
 $ Pressure_Cooker : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Chairs          : int  1 -1 -1 -1 1 1 -1 -1 2 -1 ...
 $ Sofas           : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Tables          : int  -1 -1 -1 -1 -1 1 -1 -1 -1 -1 ...
 $ Cot_Bed         : int  2 2 3 -1 2 1 2 1 2 -1 ...
 $ Cupboards       : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Bicycle         : int  1 1 1 -1 -1 2 -1 -1 1 -1 ...
 $ Motor_Cycle     : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Animal_Draw_Cart: int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Car             : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Tractor         : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Computer        : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Electric_Fan    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
Tip

According to the data dictionary:

-1 : Missing or not available

0, 1, 2, 3, … : Number of items the household owns

2.1.3. Checking the summary of the variables

summary(Q1_B_106)
      FSN            Radio           CD_Player       BW_Television    
 Min.   :45001   Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000  
 1st Qu.:48527   1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000  
 Median :52032   Median :-1.0000   Median :-1.0000   Median :-1.0000  
 Mean   :52110   Mean   :-0.7555   Mean   :-0.9517   Mean   :-0.9015  
 3rd Qu.:55671   3rd Qu.:-1.0000   3rd Qu.:-1.0000   3rd Qu.:-1.0000  
 Max.   :59606   Max.   : 6.0000   Max.   : 3.0000   Max.   : 2.0000  
 Color_Television  Video_DVD_Player      Mobile         Non_Mobile_Phone 
 Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.00000   Min.   :-1.0000  
 1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.00000   1st Qu.:-1.0000  
 Median :-1.0000   Median :-1.0000   Median :-1.00000   Median :-1.0000  
 Mean   :-0.9217   Mean   :-0.9785   Mean   : 0.01323   Mean   :-0.9885  
 3rd Qu.:-1.0000   3rd Qu.:-1.0000   3rd Qu.: 1.00000   3rd Qu.:-1.0000  
 Max.   : 2.0000   Max.   : 1.0000   Max.   :10.00000   Max.   : 1.0000  
  Refrigerator          Iron         Sewing_Machine        Watch        
 Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000  
 1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000  
 Median :-1.0000   Median :-1.0000   Median :-1.0000   Median :-1.0000  
 Mean   :-0.9937   Mean   :-0.9415   Mean   :-0.8969   Mean   :-0.3998  
 3rd Qu.:-1.0000   3rd Qu.:-1.0000   3rd Qu.:-1.0000   3rd Qu.: 1.0000  
 Max.   : 1.0000   Max.   : 4.0000   Max.   : 3.0000   Max.   :12.0000  
 Pressure_Cooker       Chairs            Sofas             Tables       
 Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000  
 1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000  
 Median :-1.0000   Median :-1.0000   Median :-1.0000   Median :-1.0000  
 Mean   :-0.8435   Mean   : 0.4572   Mean   :-0.9602   Mean   :-0.2501  
 3rd Qu.:-1.0000   3rd Qu.: 2.0000   3rd Qu.:-1.0000   3rd Qu.: 1.0000  
 Max.   : 5.0000   Max.   :40.0000   Max.   : 8.0000   Max.   :10.0000  
    Cot_Bed         Cupboards          Bicycle          Motor_Cycle     
 Min.   :-1.000   Min.   :-1.0000   Min.   :-1.00000   Min.   :-1.0000  
 1st Qu.: 2.000   1st Qu.:-1.0000   1st Qu.:-1.00000   1st Qu.:-1.0000  
 Median : 2.000   Median :-1.0000   Median : 1.00000   Median :-1.0000  
 Mean   : 2.599   Mean   :-0.9624   Mean   : 0.09793   Mean   :-0.8656  
 3rd Qu.: 3.000   3rd Qu.:-1.0000   3rd Qu.: 1.00000   3rd Qu.:-1.0000  
 Max.   :24.000   Max.   : 7.0000   Max.   : 6.00000   Max.   : 5.0000  
 Animal_Draw_Cart       Car             Tractor           Computer      
 Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000   Min.   :-1.0000  
 1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000   1st Qu.:-1.0000  
 Median :-1.0000   Median :-1.0000   Median :-1.0000   Median :-1.0000  
 Mean   :-0.9919   Mean   :-0.9938   Mean   :-0.9874   Mean   :-0.9982  
 3rd Qu.:-1.0000   3rd Qu.:-1.0000   3rd Qu.:-1.0000   3rd Qu.:-1.0000  
 Max.   : 1.0000   Max.   : 2.0000   Max.   : 3.0000   Max.   : 1.0000  
  Electric_Fan    
 Min.   :-1.0000  
 1st Qu.:-1.0000  
 Median :-1.0000  
 Mean   :-0.6872  
 3rd Qu.:-1.0000  
 Max.   :11.0000  
Tip

For example, the maximum number of radios in a single household is 6.

2.1.4. Transform to dichotomous variables

We need to convert every ‘-1’ to ‘0’ and every value ‘≥ 1’ to ‘1’.

# We can do it with ifelse(): 1 if Radio > 0, otherwise 0
Q1_B_106$own_Radio <- ifelse(Q1_B_106$Radio>0,1,0)


# Another way:
# Q1_B_106$own_Radio <- as.numeric(Q1_B_106$Radio > 0)
# Radio > 0 is TRUE only when the household has at least one radio;
# as.numeric() turns TRUE into 1 and FALSE into 0


# Check that it worked
table(Q1_B_106$own_Radio, Q1_B_106$Radio, useNA = "always")
      
          -1     1     2     3     6  <NA>
  0    11753     0     0     0     0     0
  1        0  1608    12     3     1     0
  <NA>     0     0     0     0     0     0
summary(Q1_B_106$own_Radio)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.1214  0.0000  1.0000 
Tip

As a result, 12.14% of households own at least one radio. Remember: when a variable only takes the values 0 and 1, its mean is the proportion of observations coded 1 (here, the proportion of households that own the item).
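The link between the mean and the proportion can be checked with the counts from the table above: 11753 households were coded -1 (now 0), and 1608 + 12 + 3 + 1 = 1624 own at least one radio (now 1). A minimal sketch that rebuilds the variable from those counts:

```r
# Rebuild own_Radio from the table above: 11753 zeros and 1624 ones
own_Radio <- c(rep(0, 11753), rep(1, 1608 + 12 + 3 + 1))

length(own_Radio)                    # 13377 households, as in Q1_B_106
mean(own_Radio)                      # mean of the 0/1 variable
sum(own_Radio) / length(own_Radio)   # share of households coded 1 (same value)
```

Both lines give 1624 / 13377, about 0.1214, the 12.14% reported by summary().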

We need to do this for every consumer-good variable:

Q1_B_106$own_Radio <- ifelse(Q1_B_106$Radio>0,1,0)
Q1_B_106$own_CD_Player <- ifelse(Q1_B_106$CD_Player>0,1,0)
Q1_B_106$own_BW_Television <- ifelse(Q1_B_106$BW_Television>0,1,0)
Q1_B_106$own_Color_Television <- ifelse(Q1_B_106$Color_Television>0,1,0)
Q1_B_106$own_Video_DVD_Player <- ifelse(Q1_B_106$Video_DVD_Player>0,1,0)
Q1_B_106$own_Mobile <- ifelse(Q1_B_106$Mobile>0,1,0)
Q1_B_106$own_Non_Mobile_Phone <- ifelse(Q1_B_106$Non_Mobile_Phone>0,1,0)
Q1_B_106$own_Refrigerator <- ifelse(Q1_B_106$Refrigerator>0,1,0)
Q1_B_106$own_Iron <- ifelse(Q1_B_106$Iron>0,1,0)
Q1_B_106$own_Sewing_Machine <- ifelse(Q1_B_106$Sewing_Machine>0,1,0)
Q1_B_106$own_Watch <- ifelse(Q1_B_106$Watch>0,1,0)
Q1_B_106$own_Pressure_Cooker <- ifelse(Q1_B_106$Pressure_Cooker>0,1,0)
Q1_B_106$own_Chairs <- ifelse(Q1_B_106$Chairs>0,1,0)
Q1_B_106$own_Sofas <- ifelse(Q1_B_106$Sofas>0,1,0)
Q1_B_106$own_Tables <- ifelse(Q1_B_106$Tables>0,1,0)
Q1_B_106$own_Cot_Bed <- ifelse(Q1_B_106$Cot_Bed>0,1,0)
Q1_B_106$own_Cupboards <- ifelse(Q1_B_106$Cupboards>0,1,0)
Q1_B_106$own_Bicycle <- ifelse(Q1_B_106$Bicycle>0,1,0)
Q1_B_106$own_Motor_Cycle <- ifelse(Q1_B_106$Motor_Cycle>0,1,0)
Q1_B_106$own_Animal_Draw_Cart <- ifelse(Q1_B_106$Animal_Draw_Cart>0,1,0)
Q1_B_106$own_Car <- ifelse(Q1_B_106$Car>0,1,0)
Q1_B_106$own_Tractor <- ifelse(Q1_B_106$Tractor>0,1,0)
Q1_B_106$own_Computer <- ifelse(Q1_B_106$Computer>0,1,0)
Q1_B_106$own_Electric_Fan <- ifelse(Q1_B_106$Electric_Fan>0,1,0)
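The 24 ifelse() lines above apply one rule to every consumer-good column, so a loop over the column names can do the same work. A minimal sketch on an invented mini version of the data (goods_toy is hypothetical):

```r
# Hypothetical mini version of Q1_B_106 (values invented; -1 = missing/not available)
goods_toy <- data.frame(
  FSN    = c(45001L, 45002L, 45003L),
  Radio  = c(1L, -1L, 6L),
  Mobile = c(-1L, 2L, -1L),
  Chairs = c(1L, -1L, 40L)
)

# Every column except the household identifier FSN is a consumer good
goods <- setdiff(names(goods_toy), "FSN")

# Same rule as the explicit lines above: own_<item> = 1 if count > 0, else 0
for (g in goods) {
  goods_toy[[paste0("own_", g)]] <- ifelse(goods_toy[[g]] > 0, 1, 0)
}

goods_toy
```

On the real data, goods would be setdiff(names(Q1_B_106), "FSN"), computed before any own_ column exists; otherwise the loop would also pick up the own_ columns.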

2.1.5. Now we will make a subset containing only the ‘own’ variables and FSN

assets <- subset(Q1_B_106, select = grepl("own|FSN", names(Q1_B_106)))
str(assets)  
'data.frame':   13377 obs. of  25 variables:
 $ FSN                 : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ own_Radio           : num  1 1 0 0 0 0 0 0 0 0 ...
 $ own_CD_Player       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_BW_Television   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Color_Television: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Video_DVD_Player: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Mobile          : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Non_Mobile_Phone: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Refrigerator    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Iron            : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Sewing_Machine  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Watch           : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Pressure_Cooker : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Chairs          : num  1 0 0 0 1 1 0 0 1 0 ...
 $ own_Sofas           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Tables          : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Cot_Bed         : num  1 1 1 0 1 1 1 1 1 0 ...
 $ own_Cupboards       : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Bicycle         : num  1 1 1 0 0 1 0 0 1 0 ...
 $ own_Motor_Cycle     : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Animal_Draw_Cart: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Car             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Tractor         : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Computer        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Electric_Fan    : num  0 0 0 0 0 0 0 0 0 0 ...

2.1.6. Now we focus only on the variables whose mean lies between 0.05 and 0.95 (5-95%)

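Why? Because we want to exclude the items that almost nobody owns (mean below 0.05, i.e. fewer than 5% of households) and the items that almost everybody owns (mean above 0.95). Such near-constant variables hardly differ between households, so they add little information to the index.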

#summary(assets)
round(sapply(assets, FUN=mean),3)   #To display the mean of each variable with 3 decimals
                 FSN            own_Radio        own_CD_Player 
           52110.000                0.121                0.024 
   own_BW_Television own_Color_Television own_Video_DVD_Player 
               0.049                0.039                0.011 
          own_Mobile own_Non_Mobile_Phone     own_Refrigerator 
               0.476                0.006                0.003 
            own_Iron   own_Sewing_Machine            own_Watch 
               0.028                0.051                0.264 
 own_Pressure_Cooker           own_Chairs            own_Sofas 
               0.072                0.429                0.017 
          own_Tables          own_Cot_Bed        own_Cupboards 
               0.329                0.968                0.013 
         own_Bicycle      own_Motor_Cycle own_Animal_Draw_Cart 
               0.524                0.065                0.004 
             own_Car          own_Tractor         own_Computer 
               0.003                0.006                0.001 
    own_Electric_Fan 
               0.110 
Tip

Besides FSN, 10 variables have a mean between 0.05 and 0.95 (5-95%).
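Instead of typing the selected names by hand, the selection can be computed from the means. A minimal sketch on an invented five-household data frame (assets_toy is hypothetical); it keeps items whose mean lies between 0.05 and 0.95, bounds included:

```r
# Hypothetical 0/1 ownership data for five households (values invented)
assets_toy <- data.frame(
  FSN       = 1:5,
  own_Radio = c(1, 0, 0, 0, 0),   # mean 0.20 -> kept
  own_Car   = c(0, 0, 0, 0, 0),   # mean 0.00 -> dropped (nobody owns it)
  own_Bed   = c(1, 1, 1, 1, 1),   # mean 1.00 -> dropped (everybody owns it)
  own_Watch = c(1, 1, 0, 0, 1)    # mean 0.60 -> kept
)

# Proportion of households owning each item (FSN excluded)
props <- sapply(assets_toy[, -1], mean)

# Keep items owned by between 5% and 95% of households
keep <- names(props)[props >= 0.05 & props <= 0.95]

assets_toy2 <- assets_toy[, c("FSN", keep)]
names(assets_toy2)
```

Applied to assets (with FSN excluded from the means), this rule should pick the same ten variables as the Tip, since none of the means printed above sits exactly on a threshold.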

2.1.7. Create a new subset (assets2) with FSN and only the variables whose mean lies between 0.05 and 0.95 (5-95%)

assets2 <- subset(assets, select = c(FSN, own_Radio,own_Mobile, own_Sewing_Machine, own_Watch, own_Pressure_Cooker, own_Chairs, own_Tables, own_Bicycle, own_Motor_Cycle, own_Electric_Fan))
str(assets2)
'data.frame':   13377 obs. of  11 variables:
 $ FSN                : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ own_Radio          : num  1 1 0 0 0 0 0 0 0 0 ...
 $ own_Mobile         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Sewing_Machine : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Watch          : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Pressure_Cooker: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Chairs         : num  1 0 0 0 1 1 0 0 1 0 ...
 $ own_Tables         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Bicycle        : num  1 1 1 0 0 1 0 0 1 0 ...
 $ own_Motor_Cycle    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Electric_Fan   : num  0 0 0 0 0 0 0 0 0 0 ...

2.2. Create the own_bov variable

Use the previously created “animals” dataset to create a variable indicating whether the household owns a bovine animal.

The researchers noticed that owning bovine animals is an important indicator, so we need to include it. The only problem is that the animal information is in a different dataset.

For each household, ‘own_bov’ = 1 if it owns cows or buffaloes, and 0 otherwise.

hlp1 <- subset(animals, select=c(FSN, count_Cow, count_Buf))
hlp1$own_bov <- NA
hlp1$own_bov[hlp1$count_Cow== 0 & hlp1$count_Buf==0] <- 0
hlp1$own_bov[hlp1$count_Cow> 0 | hlp1$count_Buf>0] <- 1 
str(hlp1)
'data.frame':   7147 obs. of  4 variables:
 $ FSN      : int  45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
 $ count_Cow: num  0 0 1 0 2 0 3 0 0 0 ...
 $ count_Buf: num  0 1 0 0 0 0 0 0 0 0 ...
 $ own_bov  : num  0 1 1 0 1 0 1 0 0 0 ...
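The two assignments that build own_bov can also be collapsed into one logical expression. A minimal sketch on invented counts (hlp_toy is hypothetical):

```r
# Hypothetical counts after NA/-1 were replaced by 0 (values invented)
hlp_toy <- data.frame(
  FSN       = c(45001L, 45002L, 45003L, 45004L),
  count_Cow = c(0, 0, 1, 2),
  count_Buf = c(0, 1, 0, 1)
)

# TRUE when the household has at least one cow OR at least one buffalo,
# then converted to 1/0
hlp_toy$own_bov <- as.numeric(hlp_toy$count_Cow > 0 | hlp_toy$count_Buf > 0)

hlp_toy
```

This matches the two-step version as long as the counts contain no missing or negative values (after 1.2, NA and -1 have already been replaced by 0).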

2.3. Create the brick_wall variable

Use the Q1_B dataset to create a variable indicating whether the household has brick walls.

The researchers also noticed that having brick walls is important, so we need to include it as well. Again, the wall information is in a different dataset (Q1_B).

2.3.1. Opening the Q1_B dataset

Q1_B <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Q1_B.csv", sep=",", dec= ".")

2.3.2. Checking the variable Wall_Material in the Q1_B dataset

table(Q1_B$Wall_Material)

   6  162  163  164  165  166 
   4 4498  169 5791 2914    1 
Tip

According to the code list:

6 = Other
162 = Grass
163 = Bamboo
164, 165, 166 = Brick

So we only need the households with the value 164, 165 or 166 in the Wall_Material variable.

2.3.3. Keep only the “FSN” and “Wall_Material” variables

hlp2 <- subset(Q1_B, select = c(FSN, Wall_Material))

2.3.4. Creating a new variable called brick_wall (1 = brick walls, 0 = other materials)

hlp2$brick_wall <- NA
hlp2$brick_wall[hlp2$Wall_Material %in% c(6,162,163)] <- 0
hlp2$brick_wall[hlp2$Wall_Material %in% c(164:166)] <- 1

table(hlp2$brick_wall, hlp2$Wall_Material)
   
       6  162  163  164  165  166
  0    4 4498  169    0    0    0
  1    0    0    0 5791 2914    1

2.4. Merging datasets

Merge all these datasets to create the asset index.

Merging plan

2.4.1. Merging hlp1 and hlp2

str(hlp1)
'data.frame':   7147 obs. of  4 variables:
 $ FSN      : int  45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
 $ count_Cow: num  0 0 1 0 2 0 3 0 0 0 ...
 $ count_Buf: num  0 1 0 0 0 0 0 0 0 0 ...
 $ own_bov  : num  0 1 1 0 1 0 1 0 0 0 ...
str(hlp2)
'data.frame':   13377 obs. of  3 variables:
 $ FSN          : num  45001 45002 45003 45004 45005 ...
 $ Wall_Material: num  164 164 164 162 163 163 162 164 164 162 ...
 $ brick_wall   : num  1 1 1 0 0 0 0 1 1 0 ...
hlp  <-  merge(hlp2, hlp1, all=TRUE, by = "FSN")
str(hlp)
'data.frame':   13377 obs. of  6 variables:
 $ FSN          : num  45001 45002 45003 45004 45005 ...
 $ Wall_Material: num  164 164 164 162 163 163 162 164 164 162 ...
 $ brick_wall   : num  1 1 1 0 0 0 0 1 1 0 ...
 $ count_Cow    : num  0 0 1 0 NA 2 NA NA 0 NA ...
 $ count_Buf    : num  0 1 0 0 NA 0 NA NA 0 NA ...
 $ own_bov      : num  0 1 1 0 NA 1 NA NA 0 NA ...
#View(hlp)

We don't need count_Cow, count_Buf or Wall_Material anymore, so we drop them:

hlp <- subset(hlp, select=-c(count_Cow,count_Buf,Wall_Material))
str(hlp)
'data.frame':   13377 obs. of  3 variables:
 $ FSN       : num  45001 45002 45003 45004 45005 ...
 $ brick_wall: num  1 1 1 0 0 0 0 1 1 0 ...
 $ own_bov   : num  0 1 1 0 NA 1 NA NA 0 NA ...

2.4.2. Merging hlp with assets2

str(hlp)
'data.frame':   13377 obs. of  3 variables:
 $ FSN       : num  45001 45002 45003 45004 45005 ...
 $ brick_wall: num  1 1 1 0 0 0 0 1 1 0 ...
 $ own_bov   : num  0 1 1 0 NA 1 NA NA 0 NA ...
str(assets2)
'data.frame':   13377 obs. of  11 variables:
 $ FSN                : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ own_Radio          : num  1 1 0 0 0 0 0 0 0 0 ...
 $ own_Mobile         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Sewing_Machine : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Watch          : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Pressure_Cooker: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Chairs         : num  1 0 0 0 1 1 0 0 1 0 ...
 $ own_Tables         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Bicycle        : num  1 1 1 0 0 1 0 0 1 0 ...
 $ own_Motor_Cycle    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Electric_Fan   : num  0 0 0 0 0 0 0 0 0 0 ...
assets3  <-  merge(assets2, hlp, all=TRUE, by = "FSN")
str(assets3)
'data.frame':   13377 obs. of  13 variables:
 $ FSN                : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ own_Radio          : num  1 1 0 0 0 0 0 0 0 0 ...
 $ own_Mobile         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Sewing_Machine : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Watch          : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Pressure_Cooker: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Chairs         : num  1 0 0 0 1 1 0 0 1 0 ...
 $ own_Tables         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Bicycle        : num  1 1 1 0 0 1 0 0 1 0 ...
 $ own_Motor_Cycle    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Electric_Fan   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ brick_wall         : num  1 1 1 0 0 0 0 1 1 0 ...
 $ own_bov            : num  0 1 1 0 NA 1 NA NA 0 NA ...
#View(assets3)

2.4.3. Check NA values

The merge creates NA values (e.g., in own_bov) for households that do not appear in the animals dataset. These households do not own a bovine animal, so the value should be 0.
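Where these NAs come from can be reproduced with two invented tables (walls and bov are hypothetical): merge() with all = TRUE keeps every household found in either table and fills the columns of the table where a household is missing with NA.

```r
# Hypothetical tables: 'walls' covers every household,
# 'bov' only the households that appear in the animals data
walls <- data.frame(FSN = c(1, 2, 3, 4), brick_wall = c(1, 0, 0, 1))
bov   <- data.frame(FSN = c(1, 3),       own_bov    = c(0, 1))

# Full outer join on FSN: households 2 and 4 have no row in 'bov'
merged <- merge(walls, bov, all = TRUE, by = "FSN")
merged

# Count the NAs created by the merge
table(merged$own_bov, useNA = "always")
```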

Checking NA

table (assets3$own_bov)

   0    1 
2816 4331 
table (assets3$own_bov, useNA = "ifany")

   0    1 <NA> 
2816 4331 6230 
table (assets3$own_bov, useNA = "always")

   0    1 <NA> 
2816 4331 6230 
str(assets3)
'data.frame':   13377 obs. of  13 variables:
 $ FSN                : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ own_Radio          : num  1 1 0 0 0 0 0 0 0 0 ...
 $ own_Mobile         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Sewing_Machine : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Watch          : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Pressure_Cooker: num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Chairs         : num  1 0 0 0 1 1 0 0 1 0 ...
 $ own_Tables         : num  0 0 0 0 0 1 0 0 0 0 ...
 $ own_Bicycle        : num  1 1 1 0 0 1 0 0 1 0 ...
 $ own_Motor_Cycle    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ own_Electric_Fan   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ brick_wall         : num  1 1 1 0 0 0 0 1 1 0 ...
 $ own_bov            : num  0 1 1 0 NA 1 NA NA 0 NA ...

2.4.4. Replacing NA

assets3$own_bov[is.na(assets3$own_bov)]<- 0
#View(assets3)
table (assets3$own_bov, useNA = "always")

   0    1 <NA> 
9046 4331    0 

2.5. Use PCA to create asset_index

Create the asset index using principal component analysis (PCA) and then categorize it into five wealth quintiles.

What is principal component analysis (PCA)?

Principal component analysis (PCA) is a statistical technique that transforms a large number of variables into a smaller number of new variables (or just one). It is often used to build scores or indices.

When we build a score or index, we tend to give every variable the same weight, as if all variables were equally important. But is that true? Usually not. PCA instead gives each variable its own weight (its loading), derived from the variances and correlations in the data. The loadings also show which variables are most representative of the first component, and to what degree.

The idea behind PCA is to find the underlying patterns in the data using its variances. The analysis will create a first (principal) component, then a second, third, and so on. Each component captures a different aspect of the variation in the data, with the first component capturing the most variation, and subsequent components capturing progressively less.

Usually we keep only the first component and use its loadings to compute a weighted score for each household.
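To make the “weighted score” concrete: with cor = TRUE, princomp() standardises each variable (subtracting its mean and dividing by its standard deviation, computed with divisor n) and multiplies the result by the loadings of the component. A minimal sketch on invented 0/1 data (radio, mobile and chairs are hypothetical) that rebuilds the first-component scores by hand and compares them with princomp():

```r
set.seed(1)  # reproducible toy data

# Hypothetical 0/1 asset indicators for 200 households (invented data)
n <- 200
radio  <- rbinom(n, 1, 0.3)
mobile <- ifelse(runif(n) < 0.7, radio, rbinom(n, 1, 0.5))  # partly copies radio
chairs <- rbinom(n, 1, 0.4)
X <- cbind(radio, mobile, chairs)

# PCA on the correlation matrix, as with cor = TRUE in 2.5.1
pca <- princomp(X, cor = TRUE)

# Weights of the first component (its loadings)
w <- pca$loadings[, 1]

# Standardise each column: subtract the mean, divide by the SD with divisor n
sd_n <- sqrt(colMeans(sweep(X, 2, colMeans(X))^2))
Z <- scale(X, center = TRUE, scale = sd_n)

# Weighted score for each household: PC1 = w_1 * z_1 + w_2 * z_2 + w_3 * z_3
pc1_manual <- as.vector(Z %*% w)

all.equal(pc1_manual, unname(pca$scores[, 1]))  # TRUE
```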

names(assets3)
 [1] "FSN"                 "own_Radio"           "own_Mobile"         
 [4] "own_Sewing_Machine"  "own_Watch"           "own_Pressure_Cooker"
 [7] "own_Chairs"          "own_Tables"          "own_Bicycle"        
[10] "own_Motor_Cycle"     "own_Electric_Fan"    "brick_wall"         
[13] "own_bov"            

2.5.1. PCA command

# cor = TRUE: PCA on the correlation matrix, i.e. on standardised variables
pcamod <- princomp(~own_Radio+own_Mobile+own_Sewing_Machine+own_Watch+own_Pressure_Cooker+own_Chairs+own_Tables+own_Bicycle+own_Motor_Cycle+own_Electric_Fan+brick_wall+own_bov, cor=TRUE, data= assets3)

2.5.2. Inspect the component loadings of Comp.1

pcamod$loadings

Loadings:
                    Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
own_Radio            0.235  0.106  0.288  0.506  0.724         0.208       
own_Mobile           0.321  0.245 -0.106  0.173               -0.415 -0.498
own_Sewing_Machine   0.235 -0.274  0.302  0.125 -0.210 -0.799  0.203 -0.131
own_Watch            0.351                0.120               -0.199  0.427
own_Pressure_Cooker  0.310 -0.348  0.220                      -0.195  0.415
own_Chairs           0.357  0.183 -0.334        -0.198         0.388       
own_Tables           0.362        -0.333        -0.216  0.115  0.420       
own_Bicycle          0.246  0.501  0.137               -0.150 -0.448       
own_Motor_Cycle      0.296 -0.280  0.277 -0.205         0.309        -0.588
own_Electric_Fan     0.328 -0.299        -0.228         0.286 -0.159  0.101
brick_wall           0.215        -0.452 -0.540  0.563 -0.357              
own_bov                     0.523  0.485 -0.523                0.317  0.121
                    Comp.9 Comp.10 Comp.11 Comp.12
own_Radio                                         
own_Mobile          -0.289  0.508  -0.162         
own_Sewing_Machine                  0.140         
own_Watch           -0.676 -0.420                 
own_Pressure_Cooker  0.310  0.280  -0.573         
own_Chairs           0.132                 -0.711 
own_Tables           0.147                  0.696 
own_Bicycle          0.516 -0.379   0.145         
own_Motor_Cycle            -0.479  -0.186         
own_Electric_Fan            0.260   0.745         
brick_wall                                        
own_bov             -0.225  0.196                 

               Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
Proportion Var  0.083  0.083  0.083  0.083  0.083  0.083  0.083  0.083  0.083
Cumulative Var  0.083  0.167  0.250  0.333  0.417  0.500  0.583  0.667  0.750
               Comp.10 Comp.11 Comp.12
SS loadings      1.000   1.000   1.000
Proportion Var   0.083   0.083   0.083
Cumulative Var   0.833   0.917   1.000
Tip

We have to choose one component whose scores we will use; here we take the first component (Comp.1), which captures the most variation.
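A note on the loadings output: the “Proportion Var” rows there are always 1/12 = 0.083, because print() divides each column's sum of squared loadings (always 1) by the number of variables. They do not show how much variance each component explains. That share comes from the component variances (pcamod$sdev^2), which with cor = TRUE add up to the number of variables; summary(pcamod) prints it. A minimal sketch on invented data (a, b and d are hypothetical):

```r
set.seed(2)  # reproducible toy data

# Hypothetical 0/1 indicators for 300 households (invented data)
n <- 300
a <- rbinom(n, 1, 0.4)
b <- ifelse(runif(n) < 0.6, a, rbinom(n, 1, 0.4))  # partly copies a
d <- rbinom(n, 1, 0.5)
pca_toy <- princomp(cbind(a, b, d), cor = TRUE)

# With cor = TRUE the component variances (sdev^2) sum to the number of variables
sum(pca_toy$sdev^2)

# Share of the total variance explained by each component
prop_var <- pca_toy$sdev^2 / sum(pca_toy$sdev^2)
round(prop_var, 3)

# summary() prints the same shares in its "Proportion of Variance" row
summary(pca_toy)
```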

2.5.3. Extract the component scores

Create a variable named PC1 in assets3 from the first-component scores in pcamod

assets3$PC1 <- pcamod$scores[ ,1]
#View(assets3)

2.5.4. Check the quintiles

quantile(assets3$PC1, probs = seq(0,1, 1/5))
         0%         20%         40%         60%         80%        100% 
-2.11118368 -1.66100172 -0.98271146 -0.06987735  1.45281527  7.18120558 

2.5.5. Create a new categorical variable ‘asset_index’ with values 1-5 by quintile

assets3$asset_index <- NA
assets3$asset_index[assets3$PC1 >= -2.11118368   & assets3$PC1 <=-1.66100172  ] <- "1"
assets3$asset_index[assets3$PC1 >-1.66100172    & assets3$PC1 <=-0.98271146   ] <- "2"
assets3$asset_index[assets3$PC1 >-0.98271146    & assets3$PC1 <=-0.06987735    ] <- "3"
assets3$asset_index[assets3$PC1 >-0.06987735     & assets3$PC1 <=1.45281527    ] <- "4"
assets3$asset_index[assets3$PC1 >1.45281527     & assets3$PC1 <=7.18120558  ] <- "5"
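The cut points above were copied by hand from the printed quantiles, which are rounded to 8 decimals: they must be retyped whenever the scores change, and an observation at the exact minimum or maximum could fall just outside the typed range. A sketch of an alternative (not the original code) computes the breaks with quantile() and assigns the groups with cut(), shown here on invented scores:

```r
set.seed(3)  # reproducible toy data

# Hypothetical first-component scores for 1,000 households (invented data)
PC1 <- rnorm(1000)

# Quintile cut points computed from the data, not typed in by hand
breaks <- quantile(PC1, probs = seq(0, 1, by = 0.2))

# include.lowest = TRUE keeps the minimum score in the first group;
# intervals are right-closed (a, b], matching the <= conditions above
asset_index <- cut(PC1, breaks = breaks, include.lowest = TRUE, labels = 1:5)

table(asset_index, useNA = "ifany")  # 200 per quintile with these 1,000 distinct scores
```

On the real data this would read assets3$asset_index <- cut(assets3$PC1, breaks = quantile(assets3$PC1, probs = seq(0, 1, by = 0.2)), include.lowest = TRUE, labels = 1:5). Note that cut() returns a factor rather than the character values used above, and it stops with an error if ties make two break points identical.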

2.5.6. Drop unnecessary variables by making a subset

Our final dataset (with the asset index) only needs FSN and asset_index.

assets4 <- subset(assets3, select= c(FSN,asset_index))
str(assets4)
'data.frame':   13377 obs. of  2 variables:
 $ FSN        : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ asset_index: chr  "4" "3" "2" "1" ...
table(assets4$asset_index)

   1    2    3    4    5 
3243 2254 2581 2632 2667 

3. Merging all the databases

3.1. Final Merging

(Questionnaire_1 + asset index + animals + Q1_B + Q1_Screening)

3.1.1 Opening Questionnaire_1

Questionnaire_1 <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Questionnaire_1.csv", sep=",", dec= ".")
str(Questionnaire_1)
'data.frame':   13377 obs. of  10 variables:
 $ FSN                    : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ panchyat_id            : int  2 2 2 2 2 2 2 2 2 2 ...
 $ village_id             : int  11 11 11 11 11 11 11 11 11 11 ...
 $ ward_no                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ household_no           : int  21 680 290 3390 181 3501 1401 2330 3750 2371 ...
 $ household_head_age     : int  50 35 60 35 40 40 60 23 25 40 ...
 $ household_head_sex     : int  2 2 2 2 2 2 2 2 2 3 ...
 $ household_head_religion: int  4 4 4 4 4 4 4 4 4 4 ...
 $ household_head_caste   : int  9 9 9 9 9 9 9 9 9 9 ...
 $ household_head_subcaste: chr  "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...

3.1.2 Merging with the assets

Quest1_assets4  <-  merge(assets4, Questionnaire_1, all=TRUE, by = "FSN")
str(Quest1_assets4)
'data.frame':   13377 obs. of  11 variables:
 $ FSN                    : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ asset_index            : chr  "4" "3" "2" "1" ...
 $ panchyat_id            : int  2 2 2 2 2 2 2 2 2 2 ...
 $ village_id             : int  11 11 11 11 11 11 11 11 11 11 ...
 $ ward_no                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ household_no           : int  21 680 290 3390 181 3501 1401 2330 3750 2371 ...
 $ household_head_age     : int  50 35 60 35 40 40 60 23 25 40 ...
 $ household_head_sex     : int  2 2 2 2 2 2 2 2 2 3 ...
 $ household_head_religion: int  4 4 4 4 4 4 4 4 4 4 ...
 $ household_head_caste   : int  9 9 9 9 9 9 9 9 9 9 ...
 $ household_head_subcaste: chr  "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
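Both tables have 13377 rows and so does the merged result, which suggests every FSN found a match. A small sketch of checks that would make this explicit, using invented toy tables (`assets_toy` and `quest_toy` are hypothetical names): confirm that FSN is unique on both sides, then list any FSNs that appear in only one table.

```r
# Hypothetical toy versions of assets4 and Questionnaire_1
assets_toy <- data.frame(FSN = c(45001L, 45002L, 45003L), asset_index = c("4", "3", "2"))
quest_toy  <- data.frame(FSN = c(45001L, 45002L, 45004L), household_head_age = c(50L, 35L, 35L))

# A one-to-one merge requires FSN to be unique in both tables
stopifnot(anyDuplicated(assets_toy$FSN) == 0, anyDuplicated(quest_toy$FSN) == 0)

# FSNs present in only one table; with all = TRUE these rows get NA on the other side
only_assets <- setdiff(assets_toy$FSN, quest_toy$FSN)
only_quest  <- setdiff(quest_toy$FSN, assets_toy$FSN)

merged <- merge(assets_toy, quest_toy, all = TRUE, by = "FSN")
merged
```

If both `setdiff()` results are empty, the full outer join (`all = TRUE`) behaves exactly like an inner join.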

3.1.3 Merging with animals

str(animals)
'data.frame':   7147 obs. of  29 variables:
 $ FSN       : int  45001 45002 45003 45004 45006 45009 45021 45024 45025 45027 ...
 $ count_Goa : num  3 0 0 2 0 7 1 3 3 0 ...
 $ dist_Goa  : int  5 NA NA 5 NA 0 5 15 0 NA ...
 $ indor_Goa : num  1 0 0 1 0 1 1 1 1 0 ...
 $ daysin_Goa: num  120 0 0 180 0 360 150 90 90 0 ...
 $ count_Pou : num  1 0 0 0 0 0 0 0 0 2 ...
 $ dist_Pou  : int  0 NA NA NA NA NA NA NA NA 0 ...
 $ indor_Pou : num  1 0 0 0 0 0 0 0 0 1 ...
 $ daysin_Pou: num  365 0 0 0 0 0 0 0 0 360 ...
 $ count_Buf : num  0 1 0 0 0 0 0 0 0 0 ...
 $ dist_Buf  : int  NA 4 NA NA NA NA NA NA NA NA ...
 $ indor_Buf : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Buf: num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Cow : num  0 0 1 0 2 0 3 0 0 0 ...
 $ dist_Cow  : int  NA NA 3 NA 0 NA 15 NA NA NA ...
 $ indor_Cow : num  0 0 1 0 1 0 0 0 0 0 ...
 $ daysin_Cow: num  0 0 210 0 360 0 0 0 0 0 ...
 $ count_Pig : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Pig  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Pig : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Pig: num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Dog : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Dog  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Dog : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Dog: num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Oth : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Oth  : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Oth : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Oth: num  0 0 0 0 0 0 0 0 0 0 ...
 - attr(*, "reshapeWide")=List of 5
  ..$ v.names: chr [1:4] "count" "dist" "indor" "daysin"
  ..$ timevar: chr "anim"
  ..$ idvar  : chr "FSN"
  ..$ times  : chr [1:7] "Goa" "Pou" "Buf" "Cow" ...
  ..$ varying: chr [1:4, 1:7] "count_Goa" "dist_Goa" "indor_Goa" "daysin_Goa" ...
Quest1_assets4_animals <-  merge(Quest1_assets4, animals, all=TRUE, by = "FSN")
str(Quest1_assets4_animals)
'data.frame':   13377 obs. of  39 variables:
 $ FSN                    : int  45001 45002 45003 45004 45005 45006 45007 45008 45009 45010 ...
 $ asset_index            : chr  "4" "3" "2" "1" ...
 $ panchyat_id            : int  2 2 2 2 2 2 2 2 2 2 ...
 $ village_id             : int  11 11 11 11 11 11 11 11 11 11 ...
 $ ward_no                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ household_no           : int  21 680 290 3390 181 3501 1401 2330 3750 2371 ...
 $ household_head_age     : int  50 35 60 35 40 40 60 23 25 40 ...
 $ household_head_sex     : int  2 2 2 2 2 2 2 2 2 3 ...
 $ household_head_religion: int  4 4 4 4 4 4 4 4 4 4 ...
 $ household_head_caste   : int  9 9 9 9 9 9 9 9 9 9 ...
 $ household_head_subcaste: chr  "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
 $ count_Goa              : num  3 0 0 2 NA 0 NA NA 7 NA ...
 $ dist_Goa               : int  5 NA NA 5 NA NA NA NA 0 NA ...
 $ indor_Goa              : num  1 0 0 1 NA 0 NA NA 1 NA ...
 $ daysin_Goa             : num  120 0 0 180 NA 0 NA NA 360 NA ...
 $ count_Pou              : num  1 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Pou               : int  0 NA NA NA NA NA NA NA NA NA ...
 $ indor_Pou              : num  1 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Pou             : num  365 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Buf              : num  0 1 0 0 NA 0 NA NA 0 NA ...
 $ dist_Buf               : int  NA 4 NA NA NA NA NA NA NA NA ...
 $ indor_Buf              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Buf             : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Cow              : num  0 0 1 0 NA 2 NA NA 0 NA ...
 $ dist_Cow               : int  NA NA 3 NA NA 0 NA NA NA NA ...
 $ indor_Cow              : num  0 0 1 0 NA 1 NA NA 0 NA ...
 $ daysin_Cow             : num  0 0 210 0 NA 360 NA NA 0 NA ...
 $ count_Pig              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Pig               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Pig              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Pig             : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Dog              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Dog               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Dog              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Dog             : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Oth              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Oth               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Oth              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Oth             : num  0 0 0 0 NA 0 NA NA 0 NA ...
#View(Quest1_assets4_animals)
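After this merge, households with no row in `animals` (for example FSN 45005) have `NA` in every animal column, while households that are in `animals` but do not keep a given species have `0`. If absence from Q1_B_1 means the household owns no animals (an assumption to confirm against the questionnaire), the counts, indoor flags and days could be set to 0 in the same way. A sketch on a toy data frame with invented values:

```r
# Toy merged data: household 45005 has no row in the animals table, so its values are NA
hh <- data.frame(FSN = c(45001L, 45004L, 45005L),
                 count_Goa  = c(3, 2, NA),
                 indor_Goa  = c(1, 1, NA),
                 daysin_Goa = c(120, 180, NA),
                 dist_Goa   = c(5L, 5L, NA))

# ASSUMPTION: a household missing from Q1_B_1 owns no animals.
# Only count/indoor/days columns become 0; distances stay NA (no animal, no distance).
zero_cols <- grep("^(count|indor|daysin)_", names(hh), value = TRUE)
hh[zero_cols][is.na(hh[zero_cols])] <- 0
hh
```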

3.1.4 Merging with Q1_B

Q1_B <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Q1_B.csv", sep=",", dec= ".")
str(Q1_B)
'data.frame':   13377 obs. of  40 variables:
 $ ID                 : int  1 2 3 4 5 6 7 8 9 10 ...
 $ FSN                : num  45001 45002 45003 45004 45005 ...
 $ Neem_Tree          : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Neem_Tree_Distance : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Size     : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Age      : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Usage    : chr  "" "" "" "" ...
 $ Neem_Tree_Use_Other: chr  "" "" "" "" ...
 $ Bamboo_Tree        : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Bamboo_Tree_Dist   : num  3 10 16 17 5 4 15 3 1 10 ...
 $ Banana_Tree        : num  0 0 0 0 0 0 0 1 1 0 ...
 $ Banana_Tree_Dist   : num  -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
 $ Rice_Field         : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Rice_Field_Dist    : num  4 12 16 13 4 13 10 10 7 12 ...
 $ Perm_Water_Body    : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Perm_Wat_Body_Dist : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Wat_Body_Mid_Point : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ No_Mosquito_Net    : num  0 2 1 0 0 0 0 0 1 0 ...
 $ Sprayed_2010       : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Sprayed_2009       : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Floor_Material     : num  153 153 153 153 153 153 153 153 155 153 ...
 $ Other_Floor_Mat    : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Is_Floor_Damp      : num  1 1 1 1 1 1 1 1 0 1 ...
 $ Roof_Material      : num  161 159 158 156 156 156 156 158 158 156 ...
 $ Other_Roof_Material: num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Wall_Material      : num  164 164 164 162 163 163 162 164 164 162 ...
 $ Other_Wall_Material: num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Windows_in_Room    : num  1 1 1 0 0 0 0 1 1 0 ...
 $ Granaries_in_HH    : num  1 1 1 0 1 1 0 1 1 0 ...
 $ Source_Drink_Water : num  167 167 92 92 92 92 92 92 92 92 ...
 $ Other_Src_Drink_Wat: num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Toilet_Facility    : num  177 177 177 177 177 177 177 177 177 177 ...
 $ Other_Toilet_Fac   : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Cooking_Fuel       : num  180 180 180 180 180 180 180 180 180 180 ...
 $ Other_Cooking_Fuel : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Source_Light       : num  182 182 182 182 182 182 182 182 182 182 ...
 $ Other_Source_Light : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Electricity_in_HH  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ No_Of_Rooms        : num  2 2 3 1 2 3 2 1 3 1 ...
 $ No_Sleeping_Rooms  : num  2 2 2 1 2 2 2 1 1 1 ...
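Q1_B uses -1 in many columns, apparently where a question did not apply (for example `Banana_Tree_Dist` is -1 whenever `Banana_Tree` is 0). If -1 is indeed a "not applicable" code (this should be confirmed in the codebook), it could be turned into `NA` so it is not treated as a real distance. A sketch on invented rows:

```r
# Invented rows with the same layout as two Q1_B columns
q1b_toy <- data.frame(FSN = c(45001, 45008),
                      Banana_Tree = c(0, 1),
                      Banana_Tree_Dist = c(-1, 5))

# ASSUMPTION: -1 means "not applicable" in every numeric column
num_cols <- vapply(q1b_toy, is.numeric, logical(1))
q1b_toy[num_cols] <- lapply(q1b_toy[num_cols], function(x) {
  x[x %in% -1] <- NA   # %in% is safe even if x already contains NA
  x
})
q1b_toy
```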
#Merge
Quest1_assets4_animals_q1B <-  merge(Q1_B, Quest1_assets4_animals, all=TRUE, by = "FSN")
str(Quest1_assets4_animals_q1B)
'data.frame':   13377 obs. of  78 variables:
 $ FSN                    : num  45001 45002 45003 45004 45005 ...
 $ ID                     : int  1 2 3 4 5 6 7 8 9 10 ...
 $ Neem_Tree              : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Neem_Tree_Distance     : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Size         : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Age          : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Usage        : chr  "" "" "" "" ...
 $ Neem_Tree_Use_Other    : chr  "" "" "" "" ...
 $ Bamboo_Tree            : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Bamboo_Tree_Dist       : num  3 10 16 17 5 4 15 3 1 10 ...
 $ Banana_Tree            : num  0 0 0 0 0 0 0 1 1 0 ...
 $ Banana_Tree_Dist       : num  -1 -1 -1 -1 -1 -1 -1 5 1 -1 ...
 $ Rice_Field             : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Rice_Field_Dist        : num  4 12 16 13 4 13 10 10 7 12 ...
 $ Perm_Water_Body        : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Perm_Wat_Body_Dist     : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Wat_Body_Mid_Point     : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ No_Mosquito_Net        : num  0 2 1 0 0 0 0 0 1 0 ...
 $ Sprayed_2010           : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Sprayed_2009           : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Floor_Material         : num  153 153 153 153 153 153 153 153 155 153 ...
 $ Other_Floor_Mat        : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Is_Floor_Damp          : num  1 1 1 1 1 1 1 1 0 1 ...
 $ Roof_Material          : num  161 159 158 156 156 156 156 158 158 156 ...
 $ Other_Roof_Material    : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Wall_Material          : num  164 164 164 162 163 163 162 164 164 162 ...
 $ Other_Wall_Material    : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Windows_in_Room        : num  1 1 1 0 0 0 0 1 1 0 ...
 $ Granaries_in_HH        : num  1 1 1 0 1 1 0 1 1 0 ...
 $ Source_Drink_Water     : num  167 167 92 92 92 92 92 92 92 92 ...
 $ Other_Src_Drink_Wat    : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Toilet_Facility        : num  177 177 177 177 177 177 177 177 177 177 ...
 $ Other_Toilet_Fac       : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Cooking_Fuel           : num  180 180 180 180 180 180 180 180 180 180 ...
 $ Other_Cooking_Fuel     : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Source_Light           : num  182 182 182 182 182 182 182 182 182 182 ...
 $ Other_Source_Light     : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Electricity_in_HH      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ No_Of_Rooms            : num  2 2 3 1 2 3 2 1 3 1 ...
 $ No_Sleeping_Rooms      : num  2 2 2 1 2 2 2 1 1 1 ...
 $ asset_index            : chr  "4" "3" "2" "1" ...
 $ panchyat_id            : int  2 2 2 2 2 2 2 2 2 2 ...
 $ village_id             : int  11 11 11 11 11 11 11 11 11 11 ...
 $ ward_no                : int  1 1 1 1 1 1 1 1 1 1 ...
 $ household_no           : int  21 680 290 3390 181 3501 1401 2330 3750 2371 ...
 $ household_head_age     : int  50 35 60 35 40 40 60 23 25 40 ...
 $ household_head_sex     : int  2 2 2 2 2 2 2 2 2 3 ...
 $ household_head_religion: int  4 4 4 4 4 4 4 4 4 4 ...
 $ household_head_caste   : int  9 9 9 9 9 9 9 9 9 9 ...
 $ household_head_subcaste: chr  "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
 $ count_Goa              : num  3 0 0 2 NA 0 NA NA 7 NA ...
 $ dist_Goa               : int  5 NA NA 5 NA NA NA NA 0 NA ...
 $ indor_Goa              : num  1 0 0 1 NA 0 NA NA 1 NA ...
 $ daysin_Goa             : num  120 0 0 180 NA 0 NA NA 360 NA ...
 $ count_Pou              : num  1 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Pou               : int  0 NA NA NA NA NA NA NA NA NA ...
 $ indor_Pou              : num  1 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Pou             : num  365 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Buf              : num  0 1 0 0 NA 0 NA NA 0 NA ...
 $ dist_Buf               : int  NA 4 NA NA NA NA NA NA NA NA ...
 $ indor_Buf              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Buf             : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Cow              : num  0 0 1 0 NA 2 NA NA 0 NA ...
 $ dist_Cow               : int  NA NA 3 NA NA 0 NA NA NA NA ...
 $ indor_Cow              : num  0 0 1 0 NA 1 NA NA 0 NA ...
 $ daysin_Cow             : num  0 0 210 0 NA 360 NA NA 0 NA ...
 $ count_Pig              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Pig               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Pig              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Pig             : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Dog              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Dog               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Dog              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Dog             : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ count_Oth              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ dist_Oth               : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Oth              : num  0 0 0 0 NA 0 NA NA 0 NA ...
 $ daysin_Oth             : num  0 0 0 0 NA 0 NA NA 0 NA ...

3.1.5 Opening Q1_Screening.csv

Q1_Screening <- read.csv("C:/Users/pined/OneDrive - Universidad Nacional Mayor de San Marcos/Javier 2022/Belgica/AC2_DataAnalysis_ThWk/Material/Q1_Screening.csv", sep=",", dec= ".")
str(Q1_Screening)
'data.frame':   81214 obs. of  10 variables:
 $ FSN                         : int  45001 45001 45001 45001 45001 45001 45002 45002 45002 45002 ...
 $ member_id                   : int  1 2 3 4 5 6 1 2 3 4 ...
 $ member_age                  : int  50 48 25 22 0 11 35 33 17 15 ...
 $ member_sex                  : int  2 3 2 3 2 2 2 3 2 2 ...
 $ fever_gt_3_days             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ suffered_vl_since_2nd_survey: int  0 0 0 0 0 0 0 0 0 0 ...
 $ date_diagnosis              : chr  "" "" "" "" ...
 $ treatment_place             : chr  "-1" "-1" "-1" "-1" ...
 $ current_status              : int  0 0 4 0 0 0 0 0 4 0 ...
 $ datedis                     : int  NA NA NA NA NA NA NA NA NA NA ...

3.1.6 Merging Q1_Screening with the merged household dataset

#Merge
Quest1_assets4_animals_q1B_Persons <-  merge(Q1_Screening, Quest1_assets4_animals_q1B, all=TRUE, by = "FSN")
str(Quest1_assets4_animals_q1B_Persons)
'data.frame':   81214 obs. of  87 variables:
 $ FSN                         : int  45001 45001 45001 45001 45001 45001 45002 45002 45002 45002 ...
 $ member_id                   : int  1 2 3 4 5 6 1 2 3 4 ...
 $ member_age                  : int  50 48 25 22 0 11 35 33 17 15 ...
 $ member_sex                  : int  2 3 2 3 2 2 2 3 2 2 ...
 $ fever_gt_3_days             : int  0 0 0 0 0 0 0 0 0 0 ...
 $ suffered_vl_since_2nd_survey: int  0 0 0 0 0 0 0 0 0 0 ...
 $ date_diagnosis              : chr  "" "" "" "" ...
 $ treatment_place             : chr  "-1" "-1" "-1" "-1" ...
 $ current_status              : int  0 0 4 0 0 0 0 0 4 0 ...
 $ datedis                     : int  NA NA NA NA NA NA NA NA NA NA ...
 $ ID                          : int  1 1 1 1 1 1 2 2 2 2 ...
 $ Neem_Tree                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Neem_Tree_Distance          : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Size              : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Age               : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Neem_Tree_Usage             : chr  "" "" "" "" ...
 $ Neem_Tree_Use_Other         : chr  "" "" "" "" ...
 $ Bamboo_Tree                 : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Bamboo_Tree_Dist            : num  3 3 3 3 3 3 10 10 10 10 ...
 $ Banana_Tree                 : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Banana_Tree_Dist            : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Rice_Field                  : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Rice_Field_Dist             : num  4 4 4 4 4 4 12 12 12 12 ...
 $ Perm_Water_Body             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Perm_Wat_Body_Dist          : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Wat_Body_Mid_Point          : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ No_Mosquito_Net             : num  0 0 0 0 0 0 2 2 2 2 ...
 $ Sprayed_2010                : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Sprayed_2009                : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Floor_Material              : num  153 153 153 153 153 153 153 153 153 153 ...
 $ Other_Floor_Mat             : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Is_Floor_Damp               : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Roof_Material               : num  161 161 161 161 161 161 159 159 159 159 ...
 $ Other_Roof_Material         : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Wall_Material               : num  164 164 164 164 164 164 164 164 164 164 ...
 $ Other_Wall_Material         : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Windows_in_Room             : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Granaries_in_HH             : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Source_Drink_Water          : num  167 167 167 167 167 167 167 167 167 167 ...
 $ Other_Src_Drink_Wat         : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Toilet_Facility             : num  177 177 177 177 177 177 177 177 177 177 ...
 $ Other_Toilet_Fac            : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Cooking_Fuel                : num  180 180 180 180 180 180 180 180 180 180 ...
 $ Other_Cooking_Fuel          : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Source_Light                : num  182 182 182 182 182 182 182 182 182 182 ...
 $ Other_Source_Light          : num  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
 $ Electricity_in_HH           : num  0 0 0 0 0 0 0 0 0 0 ...
 $ No_Of_Rooms                 : num  2 2 2 2 2 2 2 2 2 2 ...
 $ No_Sleeping_Rooms           : num  2 2 2 2 2 2 2 2 2 2 ...
 $ asset_index                 : chr  "4" "4" "4" "4" ...
 $ panchyat_id                 : int  2 2 2 2 2 2 2 2 2 2 ...
 $ village_id                  : int  11 11 11 11 11 11 11 11 11 11 ...
 $ ward_no                     : int  1 1 1 1 1 1 1 1 1 1 ...
 $ household_no                : int  21 21 21 21 21 21 680 680 680 680 ...
 $ household_head_age          : int  50 50 50 50 50 50 35 35 35 35 ...
 $ household_head_sex          : int  2 2 2 2 2 2 2 2 2 2 ...
 $ household_head_religion     : int  4 4 4 4 4 4 4 4 4 4 ...
 $ household_head_caste        : int  9 9 9 9 9 9 9 9 9 9 ...
 $ household_head_subcaste     : chr  "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
 $ count_Goa                   : num  3 3 3 3 3 3 0 0 0 0 ...
 $ dist_Goa                    : int  5 5 5 5 5 5 NA NA NA NA ...
 $ indor_Goa                   : num  1 1 1 1 1 1 0 0 0 0 ...
 $ daysin_Goa                  : num  120 120 120 120 120 120 0 0 0 0 ...
 $ count_Pou                   : num  1 1 1 1 1 1 0 0 0 0 ...
 $ dist_Pou                    : int  0 0 0 0 0 0 NA NA NA NA ...
 $ indor_Pou                   : num  1 1 1 1 1 1 0 0 0 0 ...
 $ daysin_Pou                  : num  365 365 365 365 365 365 0 0 0 0 ...
 $ count_Buf                   : num  0 0 0 0 0 0 1 1 1 1 ...
 $ dist_Buf                    : int  NA NA NA NA NA NA 4 4 4 4 ...
 $ indor_Buf                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Buf                  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Cow                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Cow                    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Cow                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Cow                  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Pig                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Pig                    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Pig                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Pig                  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Dog                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Dog                    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Dog                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Dog                  : num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Oth                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ dist_Oth                    : int  NA NA NA NA NA NA NA NA NA NA ...
 $ indor_Oth                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ daysin_Oth                  : num  0 0 0 0 0 0 0 0 0 0 ...
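This merge is many-to-one: each household row is repeated for every member, which is why household variables such as `Bamboo_Tree_Dist` repeat within an FSN, and the result keeps the 81214 rows of Q1_Screening. A sketch of checks that would catch a duplicated household key or member-less households added by `all = TRUE`, using invented toy tables (`persons` and `households` are hypothetical names):

```r
persons    <- data.frame(FSN = c(45001L, 45001L, 45002L), member_id = c(1L, 2L, 1L))
households <- data.frame(FSN = c(45001L, 45002L), asset_index = c("4", "3"))

# The household key must be unique, otherwise members would be duplicated by the merge
stopifnot(anyDuplicated(households$FSN) == 0)

merged <- merge(persons, households, all = TRUE, by = "FSN")

# Same number of rows as persons: no duplicated members, no member-less households added
stopifnot(nrow(merged) == nrow(persons))
# Each person (FSN + member_id) still appears exactly once
stopifnot(anyDuplicated(merged[c("FSN", "member_id")]) == 0)
merged
```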

3.2. Select variables of interest

For the final dataset, we keep only the variables needed for the analysis.

final_dataset <- subset(Quest1_assets4_animals_q1B_Persons, select = c(
  FSN, asset_index, Bamboo_Tree, Banana_Tree, Cooking_Fuel, Floor_Material,
  Granaries_in_HH, household_head_subcaste, indor_Buf, indor_Cow, indor_Pou,
  indor_Goa, count_Cow, count_Buf, count_Goa, count_Pou, Is_Floor_Damp,
  member_age, member_sex, Neem_Tree, No_Mosquito_Net, Perm_Water_Body,
  Rice_Field, Roof_Material, Sprayed_2009, Sprayed_2010,
  suffered_vl_since_2nd_survey, Wall_Material, Windows_in_Room
))

str(final_dataset)
'data.frame':   81214 obs. of  29 variables:
 $ FSN                         : int  45001 45001 45001 45001 45001 45001 45002 45002 45002 45002 ...
 $ asset_index                 : chr  "4" "4" "4" "4" ...
 $ Bamboo_Tree                 : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Banana_Tree                 : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Cooking_Fuel                : num  180 180 180 180 180 180 180 180 180 180 ...
 $ Floor_Material              : num  153 153 153 153 153 153 153 153 153 153 ...
 $ Granaries_in_HH             : num  1 1 1 1 1 1 1 1 1 1 ...
 $ household_head_subcaste     : chr  "PASWAN" "PASWAN" "PASWAN" "PASWAN" ...
 $ indor_Buf                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ indor_Cow                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ indor_Pou                   : num  1 1 1 1 1 1 0 0 0 0 ...
 $ indor_Goa                   : num  1 1 1 1 1 1 0 0 0 0 ...
 $ count_Cow                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ count_Buf                   : num  0 0 0 0 0 0 1 1 1 1 ...
 $ count_Goa                   : num  3 3 3 3 3 3 0 0 0 0 ...
 $ count_Pou                   : num  1 1 1 1 1 1 0 0 0 0 ...
 $ Is_Floor_Damp               : num  1 1 1 1 1 1 1 1 1 1 ...
 $ member_age                  : int  50 48 25 22 0 11 35 33 17 15 ...
 $ member_sex                  : int  2 3 2 3 2 2 2 3 2 2 ...
 $ Neem_Tree                   : num  0 0 0 0 0 0 0 0 0 0 ...
 $ No_Mosquito_Net             : num  0 0 0 0 0 0 2 2 2 2 ...
 $ Perm_Water_Body             : num  0 0 0 0 0 0 0 0 0 0 ...
 $ Rice_Field                  : num  1 1 1 1 1 1 1 1 1 1 ...
 $ Roof_Material               : num  161 161 161 161 161 161 159 159 159 159 ...
 $ Sprayed_2009                : num  86 86 86 86 86 86 86 86 86 86 ...
 $ Sprayed_2010                : num  86 86 86 86 86 86 86 86 86 86 ...
 $ suffered_vl_since_2nd_survey: int  0 0 0 0 0 0 0 0 0 0 ...
 $ Wall_Material               : num  164 164 164 164 164 164 164 164 164 164 ...
 $ Windows_in_Room             : num  1 1 1 1 1 1 1 1 1 1 ...
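Before saving, a quick count of missing values per column shows which variables carry `NA` into the analysis; here the animal columns would be expected to contain `NA` for households that had no row in the animals table. A sketch on a toy stand-in for `final_dataset` with invented values:

```r
toy <- data.frame(FSN = c(45001L, 45001L, 45005L),
                  count_Cow = c(0, 0, NA),
                  member_age = c(50L, 48L, 40L))

na_counts <- colSums(is.na(toy))
na_counts[na_counts > 0]   # only the columns that contain at least one NA
```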

Finally, we save the final dataset as an RDS file.

saveRDS(final_dataset, "final_dataset.RDS")
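`saveRDS()` stores a single R object, including column types and attributes, and `readRDS()` restores it under any name, as was done for Q1_B_1 at the start. A small round-trip check on a temporary file, with a toy data frame standing in for `final_dataset`:

```r
toy <- data.frame(FSN = c(45001L, 45002L), asset_index = c("4", "3"))

path <- tempfile(fileext = ".RDS")   # temporary file, so nothing is left in the project folder
saveRDS(toy, path)
reloaded <- readRDS(path)

identical(reloaded, toy)
```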